Rank | Count | Beginning |
---|---|---|
4885 | 677 | Le |
5504 | 649 | Les |
4213 | 505 | La |
3406 | 426 | Il |
2661 | 313 | En |
7084 | 256 | Nous |
7842 | 222 | Pour |
9767 | 198 | Vous |
743 | 189 | Ce |
8659 | 152 | Si |
9273 | 144 | Un |
1325 | 140 | Cette |
1871 | 131 | Dans |
2506 | 128 | Elle |
9310 | 127 | Une |
2027 | 124 | De |
6 | 112 | À |
1108 | 102 | C'est |
4031 | 102 | Je |
7376 | 94 | On |
1002 | 84 | Ces |
7529 | 84 | Par |
365 | 81 | Au |
2241 | 81 | Dès |
3747 | 79 | Ils |
39 | 70 | « |
7010 | 61 | Notre |
1107 | 53 | C’est |
9117 | 53 | Tout |
2181 | 52 | Depuis |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
Especially for N=1, we only need a small corpus to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
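The same count can be reproduced outside the database. The following is a minimal Python sketch, not the tooling used to produce the table above. It assumes the sentences are available as a list of strings and that words are separated by single spaces, which is how `substring_index(sentence, ' ', N)` delimits them. The function name `sentence_beginnings` and the toy data in `sample` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Mirrors substring_index(sentence, ' ', n): keep the text before the
    n-th space, or the whole sentence if it contains fewer than n spaces.
    """
    counts = Counter()
    for sentence in sentences:
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count, like ORDER BY cnt DESC LIMIT ...
    return counts.most_common(limit)


# Hypothetical toy data, not taken from the corpus.
sample = [
    "Le chat dort.",
    "Le chien aboie.",
    "Les enfants jouent.",
    "Il pleut.",
]

print(sentence_beginnings(sample, n=1))
# → [('Le', 2), ('Les', 1), ('Il', 1)]
```

Passing `n=2`, `n=3` or `n=4` gives the longer beginnings covered in the following subsections. As in the SQL query, tokens are compared as raw strings, so variants such as `C'est` and `C’est` are counted separately.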